Content analytics 'All Abouts'


At the Analytics Camp 2010 event last weekend, I attended Manya Mayes' presentation on Text Analytics. I noted two 'All Abouts' that affect any content analysis. No, not the Girl Scout Cookies, but the things that really ensure successful analysis of content. And by content, I mean any non-numeric, non-standard data, which could take forms such as text (the most common), video, voice, music, etc.

1. It's All About Classification
You can successfully analyze content ONLY when you categorize each 'event' properly.

Manya provided an example involving text comments. An automobile company was gathering design input from customers, and many women mentioned wanting a place to put their 'purse'. However, respondents likely use other terms such as 'pocketbook', 'handbag' or 'bag' to mean the same thing. Classifying all of this feedback into a single category is critical to accurate analysis.
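To make the purse example concrete, here is a minimal Python sketch of rolling several terms up into one category before counting. The synonym map, the `categorize` helper, and the sample comments are all made up for illustration; a real text analytics tool would likely need stemming, phrase handling, and a much larger vocabulary.

```python
import re
from collections import Counter

# Hypothetical synonym map: every variant term points to one canonical category.
SYNONYMS = {
    "purse": "purse",
    "pocketbook": "purse",
    "handbag": "purse",
    "bag": "purse",
}

def categorize(comment):
    """Return the set of canonical categories mentioned in one comment."""
    words = re.findall(r"[a-z]+", comment.lower())
    return {SYNONYMS[w] for w in words if w in SYNONYMS}

# Invented sample feedback, not real survey data.
comments = [
    "I need somewhere to put my purse.",
    "There is no spot for a handbag up front.",
    "My pocketbook slides off the seat.",
    "Love the cup holders.",
]

# Count each category once per comment that mentions it.
counts = Counter(cat for c in comments for cat in categorize(c))
print(counts)
```

Counted separately, 'purse', 'handbag' and 'pocketbook' would each show up once here; grouped, the same request shows up three times.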

I then recalled a presentation by Jeffery Ma at a Las Vegas Premier Business Leadership Series several years ago. He was one of the whiz-kids on the MIT Blackjack Team and later joined PROTRADE, an analytics company that reviews video content of athletes and provides data back to teams looking for talent, as well as to other organizations (TV, gambling, etc). He described classification of all this data as a KEY to determining whether talent was perceived or actual. (Of course, a local example of data vs. perception is former Duke Basketball star Shane Battier. He has gone on to be a key defender for the NBA's Houston Rockets, but because he is a low scorer, some people don't recognize his true value. Read the articles, such as: http://www.nytimes.com/2009/02/15/magazine/15Battier-t.html) Jeffery Ma stated that it is by classifying not just scores but also actions (in basketball these include blocks, passes, drawn fouls, etc) that the TRUE value and capabilities of athletes become calculable. In order to classify videos, Jeffery hires former coaches and sports experts to help categorize what is seen. (I am currently unaware of computerized models that would do this accurately and automatically. Is anyone working on this?)

2. It's All About Severity & Prioritization
Sometimes the frequency of an event doesn't tell you whether it's important enough to work on. You must also consider its severity. Take, for instance, Toyota's recent brake & acceleration issues. If they were analyzing text comments, the number of events was probably very small. However, some text could include (yes, I am totally speculating here) terms such as 'accident', 'death', etc. Any of these terms should be marked at a higher severity and prioritized accordingly. So even if only 5 of these existed among the 5 million cars with this braking/acceleration system, a huge RED flag should be presented to the analysts & management.
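As a rough sketch of ranking by severity rather than frequency, the Python below scores each comment by its most severe term and sorts on that score. The weights, the `severity` and `prioritize` helpers, and the sample comments are assumptions made up for this example, not anything Toyota or any analytics product actually uses.

```python
import re

# Hypothetical severity weights; any term not listed gets the default.
SEVERITY = {"accident": 100, "death": 1000}
DEFAULT_SEVERITY = 1

def severity(comment):
    """Score a comment by the most severe term it contains."""
    words = re.findall(r"[a-z]+", comment.lower())
    return max((SEVERITY.get(w, DEFAULT_SEVERITY) for w in words),
               default=DEFAULT_SEVERITY)

def prioritize(comments):
    """Order comments from most to least severe (ties keep input order)."""
    return sorted(comments, key=severity, reverse=True)

# Invented sample feedback, not real customer data.
comments = [
    "The radio is hard to tune.",
    "The pedal stuck and I had an accident.",
    "Seats are uncomfortable on long trips.",
]

ranked = prioritize(comments)
for c in ranked:
    print(severity(c), c)
```

The single 'accident' comment rises to the top even though complaints about comfort and the radio outnumber it, which is the RED flag behavior described above.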


About Author

Angela Hall

Senior Technical Architect

Angela offers tips on using the SAS Business Intelligence solutions. She manages a team of SAS Fraud Framework implementers within the SAS Solutions On-Demand organization. Angela also has co-written two books, 'Building BI using SAS, Content Development Examples' & 'The 50 Keys to Learning SAS Stored Processes'.

1 Comment

  1. Great points here Angela. I especially like the one about classifying feedback into a single category to establish an accurate analysis. That is exactly what our customers tell us they are looking for when we do their market research. We call them insight "themes" and use our own analytics technology to extract these themes from social media content.
